Abstract: In grid networks, distributed resources are interconnected by wide area networks to support compute- and data-intensive applications, which require reliable and efficient transfer of gigabits (even terabits) of data. Unlike best-effort traffic in the Internet, bulk data transfer in grids requires bandwidth reservation as a fundamental service. Existing reservation schemes such as RSVP are designed for real-time traffic specified by a reservation rate and a transfer start time, but with unknown lifetime. In comparison, bulk data transfer requests are defined in terms of volume and deadline, which provides more information and allows more flexibility in reservation schemes: the transfer start time can be chosen flexibly, and the reservation for a single request can be divided into multiple intervals with different reservation rates. We define a flexible reservation framework using time-rate function algebra, and identify a series of practical reservation scheme families with increasing generality and potential performance, namely FixTime-FixRate, FixTime-FlexRate, FlexTime-FlexRate, and Multi-Interval. Simple heuristics are used to select a representative scheme from each family for performance comparison. Simulation results show that increasing flexibility can potentially improve system performance, reducing both blocking probability and mean flow time. We also discuss the distributed implementation of the proposed framework.
A flexible bandwidth reservation framework for bulk transfers in grid networks

Summary: In grid networks, distributed resources are interconnected by long-distance networks to run compute- or data-intensive applications, which require reliable and efficient transfers of data volumes on the order of several gigabytes or terabytes. Bulk transfers in grids, unlike the best-effort traffic of the Internet, require a bandwidth reservation service. Existing reservation schemes, such as RSVP, were designed for real-time traffic, for which a reserved rate and a transfer start date are specified but the duration is not. In comparison, bulk grid transfers are defined in terms of volume and deadline, which offers more information and allows more flexible reservation schemes. The actual start of the transfer can be chosen, and a reservation for a single request can be divided into several intervals with different reserved rates. We define a flexible bandwidth reservation framework using a time-rate function algebra and identify a series of reservation scheme families, which we name FixTime-FixRate, FixTime-FlexRate, FlexTime-FlexRate, and Multi-Interval, with increasing generality and potential performance. Simple heuristics are used to select a representative scheme from each family to compare performance. Simulation results show that increasing flexibility can potentially improve system performance, reducing the blocking probability and the mean flow duration. We also discuss the distributed implementation of the proposed framework.

Keywords: Reservation, grid, bulk transfers, flexibility
1 Introduction

Grid computing is a promising technology that brings together large collections of geographically distributed resources (e.g., computing, storage, visualization) to build a very high performance computing environment for compute- and data-intensive applications [?]. Grid networks connect multiple sites, each comprising a number of processors, storage systems, databases, scientific instruments, etc. In grid applications such as experimental analysis and simulation in high-energy physics, climate modeling, earthquake engineering, drug design, and astronomy, massive datasets must be shared by a community of researchers distributed across different sites. These researchers transfer large subsets of data across the network for processing. The volume of a dataset can usually be determined from the task specification, and a strict deadline is often specified to guarantee timely completion of the whole task and to enforce efficient use of expensive grid resources: not only network bandwidth, but also the co-allocated CPUs, disks, etc.

While Internet bulk data transfer works well with best-effort service, high-performance grid applications require bandwidth reservation for bulk data transfer as a fundamental service. Besides the strict deadline requirements and expensive co-allocated resources discussed above, the lower multiplexing level of grid networks compared to the Internet is a main driving force for bandwidth reservation. In the Internet, source access rates are generally much smaller (2 Mbps for DSL lines) than the backbone link capacity (hundreds to thousands of Mbps, say). The coexistence of many active flows on a single link smooths the variation of arrival demands, by the law of large numbers, and the link is not a bottleneck until demand exceeds 90% of its capacity [?]. Thus no proactive admission control is used in the Internet for bulk data transfer. Instead, distributed transport protocols such as TCP are used to statistically share the available bandwidth among flows in a "fair" way. In contrast, in the grid context, the capacity of a single source (c = 1 Gbps) is comparable to the capacity of the bottleneck link. For a system with a small multiplexing level, if no proactive admission control is applied, a burst of load greatly degrades system performance.

A concrete example is given in Section 2 to demonstrate the importance of resource reservation for grid networks. Through the example, we also show that the existing RSVP-type framework is not flexible enough for bulk data transfer reservation. In Section 3, we define a flexible reservation framework using time-rate function algebra. Section 4 identifies a series of practical reservation scheme families with increasing generality, and we use simple heuristics to select a representative scheme from each family. In Section 5, simulation results for the chosen schemes are presented and the impact of flexibility is analyzed. A distributed architecture is proposed in Section 6. In Section 7, we briefly review related work on bandwidth reservation. Finally, we conclude in Section 8.
2 Motivation

In Figure 1, we simulate a single link with capacity $C$. Bulk data transfer requests arrive according to a Poisson process with parameter $\lambda$. Request volume is independent of arrival time and follows an exponential distribution with parameter $\mu$. Simulations with other arrival processes and traffic volume distributions reveal similar trends, which are not presented here for brevity. The load is $\rho = \lambda/(C\mu)$. Requests have a maximal transfer rate $R_{\max}$: in the Internet setting $R_{\max}^{Internet} = C/100$, and in the grid setting $R_{\max}^{grid} = C/10$. An ideal transport protocol is assumed, so that if there are no more than $C/R_{\max}$ active flows, all of them transfer at full rate $R_{\max}$. If there are $n > C/R_{\max}$ active flows, they all transfer at rate $C/n$. A request with volume $v$ "fails" and immediately terminates if it does not complete its transfer within $v/R_{\min}$ time, where $R_{\min} < R_{\max}$ is the expected average throughput of the request (in this example $R_{\min} = R_{\max}/2$ for all requests). In the Internet-NoAC setting (AC stands for "Admission Control"), the fail probability is low until the load $\rho$ exceeds 95%. In the grid-NoAC setting, however, the fail probability is non-negligible even under a medium load, and it deteriorates rapidly as load increases. We therefore consider a simple reservation scheme, which forces requests to reserve $R_{\min}$ bandwidth when they arrive, so that all accepted requests are guaranteed to complete before their deadline (fail probability is 0). Requests are blocked if the number of active reservations reaches $C/R_{exp}$. This kind of reservation can be supported by existing reservation schemes, for example RSVP [?]. In the grid-AC setting, we still assume an ideal transport protocol, i.e., accepted requests are able to fairly share unreserved capacity in addition to their reserved bandwidth. The block probability of the grid-AC setting is much lower than the fail probability of the grid-NoAC setting.




Figure 1: Fail/block probability under different multiplexing level 

In Figure 1, we also plot a variation of the grid-AC setting in which flows can only use reserved bandwidth. Under this dull transport protocol assumption, the link can be modeled as a standard M/M/m/m queuing system with $m = C/R_{\min} = 10$. Comparing this M/M/m/m setting against the grid-NoAC setting, a simple reservation scheme with a dull transport protocol can still outperform the no-admission-control setting with an ideal transport protocol when load is relatively high. This again demonstrates the benefit of reservation. Meanwhile, the large performance gap between the M/M/m/m setting and the grid-AC setting shows that when the transport protocol is dull, an RSVP-type reservation does not fully exploit the system's capacity. Transport protocol design for high-speed networks is still an area of ongoing research. Complementarily, in this paper we consider how to improve system performance by using more flexible reservation schemes.
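For reference, the blocking probability of an M/M/m/m system is given by the classical Erlang B formula. The sketch below evaluates it with the standard numerically stable recursion. The function name `erlang_b` and the sampled load values are illustrative choices, and the mapping from load $\rho$ to offered traffic ($\rho m$ Erlangs, since each reservation occupies one of $m$ slots of size $R_{\min}$) is an assumption derived from the load definition above rather than something stated in the text.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Blocking probability of an M/M/m/m system (Erlang B formula).

    Uses the recursion B(0) = 1, B(k) = a*B(k-1) / (k + a*B(k-1)),
    where a is the offered load in Erlangs.
    """
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking


m = 10  # number of simultaneous reservations the link can hold (from the text)
# Assumption: a normalized load rho corresponds to rho * m Erlangs of offered traffic.
blocking_by_load = {rho: erlang_b(m, rho * m) for rho in (0.5, 0.7, 0.9)}
for rho, b in blocking_by_load.items():
    print(f"load {rho:.1f}: blocking probability {b:.4f}")
```

The recursion avoids the factorials of the closed-form expression, so it stays accurate for large $m$ as well.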

RSVP is designed for real-time traffic, which normally requests a specified amount of bandwidth from a fixed start time. The lifetime of such traffic is unknown, so a reservation remains in effect for an indefinite duration until an explicit "Teardown" signal is issued or the soft state expires. Instead, bulk data transfer requests are specified by volume and deadline. This allows more flexibility in the design of reservation schemes. As the volume is known, the completion time can be calculated by the scheduler and kept in time-indexed reservation state. If there is not enough bandwidth at the moment a request arrives, the transfer can be scheduled to start at some future time, as long as it can complete before the deadline. A bandwidth reservation can also comprise sub-intervals with different reserved rates.



[Figure 2 plot: rate (Tbph) versus time (h), with link capacity C marked and regions labeled "No solution", "One solution", and "Multiple solutions".]

Figure 2: Flexible reservation schemes example



The limitation of RSVP-type reservation for bulk data transfer is illustrated in Figure 2. In this example, we consider a link with capacity $C = 4$ Tbph. Requests arrive online with varying volume; their maximal transfer rate is $R_{\max} = 2$ Tbph and their minimum average transfer rate is $R_{\min} = 1$ Tbph. A request arriving at time $t$ with volume $v$ has deadline $t + v/R_{\min}$. Assume that at the current time 0h there are four active reservations, each reserving 1 Tbph of bandwidth. Their termination times are known and marked in the figure. A new request arrives at 0h with volume $v = 4$ Tb, and its deadline is 0h + 4 Tb$/R_{\min}$ = 4h. Since there is no bandwidth left at time 0h, this request will be rejected by an RSVP-type reservation scheme. This unnecessary rejection can be avoided if we use a more flexible reservation scheme and exploit the time-indexed reservation state information. A feasible reservation solution is to reserve 1 Tbph for the request from time 1h (rather than from 0h) until 3h, followed by a different reservation rate of 2 Tbph until 4h.

In the case of $v = 4$ Tb, this is the only solution that accepts the request and guarantees its successful completion without preempting any existing reservation. However, if the request has volume $v = 2$ Tb and thus deadline 2h, no feasible solution exists to accept the new request unless preemption is allowed. The concept of preemption is borrowed from the job scheduling literature; it means the modification (including teardown) of the reservation state of an already-accepted request by the system. Compared to non-preemptive schemes, preemptive schedulers enjoy higher decision flexibility, which implies potential performance gain. But they have some drawbacks, including:

• Dropping an accepted request causes more dissatisfaction than blocking a new one;

• Dynamic change (QoS degradation) of reservation state hurts service predictability, which is important because bandwidth is co-allocated with other resources.

Also, it is challenging to design a distributed preemptive reservation architecture. In this paper, we focus on a non-preemptive reservation framework.

There may be multiple feasible solutions to accept a request, for example if the request here has volume $v = 6$ Tb and deadline 6h. The algorithm that selects one solution out of all feasible solutions depends on the objective function of the reservation scheme. Besides increasing the accept probability, there are other important performance criteria. Borrowing a concept again from job scheduling, flow time is defined as the time between a request's arrival and its completion. For bulk data transfers, especially in grid applications, it is desirable to minimize flow time. A smaller flow time not only improves users' satisfaction, but also releases all co-allocated resources back to the sharing pool earlier. Fairness among flows is also an important performance criterion. For example, bulk data transfers may define fairness over their average throughput. These criteria may conflict with each other. For example, the solution that minimizes flow time here is to reserve 1 Tbph from 1h to 3h and 2 Tbph from 3h to 5h, so that the request finishes at 5h. The solution that minimizes the peak reservation rate is to reserve 1 Tbph from 1h to 3h and 4/3 Tbph from 3h to 6h. Yet another reasonable solution is to reserve 0.5 Tbph from 1h to 3h, 1.5 Tbph from 3h to 5h, followed by 2 Tbph ($R_{\max}$) from 5h to 6h, so that the variation of the remaining bandwidth along the time axis is minimized.
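As a quick arithmetic check of the three candidate reservations described above, the following sketch sums the volume carried by each one and reports its completion time and peak rate. The `(start, end, rate)` triple encoding and the dictionary labels are illustrative choices, not notation from the paper; exact fractions are used so that 4/3 Tbph introduces no rounding.

```python
from fractions import Fraction

def volume(segments):
    """Total volume (Tb) of a reservation given as (start_h, end_h, rate_tbph) pieces."""
    return sum((end - start) * rate for start, end, rate in segments)

# The three candidate solutions for the request with v = 6 Tb and deadline 6h.
candidates = {
    "min flow time": [(1, 3, Fraction(1)), (3, 5, Fraction(2))],
    "min peak rate": [(1, 3, Fraction(1)), (3, 6, Fraction(4, 3))],
    "min residual variation": [(1, 3, Fraction(1, 2)), (3, 5, Fraction(3, 2)),
                               (5, 6, Fraction(2))],
}

summary = {}
for name, segments in candidates.items():
    finish = max(end for _, end, _ in segments)
    peak = max(rate for _, _, rate in segments)
    summary[name] = (volume(segments), finish, peak)
    print(f"{name}: volume={summary[name][0]} Tb, finish={finish}h, peak={peak} Tbph")
```

All three carry exactly 6 Tb by the 6h deadline; they differ only in completion time and peak rate, which is the trade-off the paragraph describes.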

It is very difficult (if not impossible) to identify the optimal solution in either the off-line or the on-line setting. Sometimes it is preferable to reject a request even when a feasible solution exists. In this paper, we do not emphasize the choice of objective functions and optimal solutions. Instead, we focus on formalizing a flexible yet practical solution space, so that a potential candidate solution will not be missed because of limited flexibility in the reservation framework.
3 Flexible reservation framework
3.1 System model

We model grid networks as a set of resources interconnected by a wide area network. The underlying communication infrastructure of grid networks is a complex interconnection of enterprise domains and public networks that exhibit potential bottlenecks and varying performance characteristics. For simplicity, we assume that a centralized scheduler manages the reservation state vector $L$ for all links in the system. We will discuss the distributed implementation in Section 6.

We define a request as a 6-tuple:

$$r = (s_r,\ \mathit{dest}_r,\ v_r,\ a_r,\ d_r,\ R_r^{\max}) \tag{1}$$

As the names suggest, source $s_r$ requests to transfer bulk data of volume $v_r$ to destination $\mathit{dest}_r$. The request arrives at time $a_r$ and the transfer is ready to begin immediately. The transfer should complete before deadline $d_r$, and $R_r^{\max}$ is the maximum rate that request $r$ can support, constrained by the link capacity of the end nodes, the application, or the transport protocol.




[Figure 3 diagram blocks: Constraint $C_r(t)$, System State $L(t)$, Decision $D_r(t)$, Plus.]

Figure 3: Reservation schemes algorithm framework

A bandwidth scheduler makes a decision for a request based on the system state $L(t)$ and the request specification $r$. As shown in Figure 3, the scheduler first calculates the constraint function $C_r(t)$ for the reservation, considering both the request specification and the current system state $L(t)$. The calculation of the constraint is a min operation over time-rate functions, which is defined below. The constraint function $C_r(t)$ is then used to make the reservation decision $D_r(t)$. $D_r(t)$ is the output of the scheduler, and is also used internally to update the link state $L(t)$.

3.2 Time-rate function algebra 

We denote the set of all time-rate functions as $\mathcal{F}$, and we define a Min-Plus algebra over $\mathcal{F}$:

$$(f_1 \min f_2)(t) = \min(f_1(t), f_2(t)) \tag{2}$$

$$(f_1 + f_2)(t) = f_1(t) + f_2(t) \tag{3}$$

While the Min-algebra is a semigroup, the Plus-algebra is a group with identity element $f^0(t) = 0,\ \forall t \in (-\infty, \infty)$. We define the $\le$ relation over $\mathcal{F}$ as:

$$f_1 \le f_2 \iff f_1(t) \le f_2(t),\ \forall t \in (-\infty, \infty) \tag{4}$$

Note that $\mathcal{F}$ with $\le$ is a partially ordered set that does not satisfy the comparability condition.
$a_r$, $d_r$, and $R_r^{\max}$ in the request specification determine a time-rate function, which can be viewed as the original constraint function imposed by the request specification:

$$C_r^{request}(t) = R_r^{\max} h(t - a_r) - R_r^{\max} h(t - d_r) \tag{5}$$

where:

$$h(t) = \begin{cases} 1 & t \in [0, \infty) \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

is the Heaviside step function (unit step function). A translation of $h(t)$ is the indicator function of a half-open interval.

The constraint calculation stage shown in Figure 3 considers both $C_r^{request}(t)$ and the system reservation state $L(t)$, so that the resulting $C_r(t)$ returns the maximum bandwidth that can be allocated to request $r$ at time $t$:

$$C_r(t) = (C_r^{request} \min L_1 \min L_2 \min \dots \min L_k)(t) \tag{7}$$

where we assume links $L_1, L_2, \dots, L_k$ form the path from source[r] to dest[r], and $L_i(t)$ is the time-(remaining bandwidth) function for link $L_i$. The min operation is illustrated in Figure 4 with two links $L_1$ and $L_2$ in request $r$'s path:

Figure 4: Calculate request's constraint function $C_r(t)$
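To make Equations (5) to (7) concrete, the sketch below represents time-rate functions as plain Python callables and builds a constraint function as a pointwise minimum. The two link functions `link1` and `link2`, and the request parameters, are hypothetical values chosen only for illustration; they are not taken from Figure 4.

```python
def h(t: float) -> float:
    """Heaviside unit step of Equation (6): 1 on [0, inf), 0 elsewhere."""
    return 1.0 if t >= 0 else 0.0

def request_constraint(arrival: float, deadline: float, r_max: float):
    """Equation (5): r_max on [arrival, deadline), 0 elsewhere."""
    return lambda t: r_max * h(t - arrival) - r_max * h(t - deadline)

def pointwise_min(*functions):
    """Equation (7): pointwise minimum of several time-rate functions."""
    return lambda t: min(f(t) for f in functions)

# Hypothetical remaining-bandwidth functions of two links on the path (Tbph).
link1 = lambda t: 1.0 if t < 3 else 4.0
link2 = lambda t: 3.0 if t < 5 else 2.0

# Hypothetical request: arrival 1h, deadline 6h, maximum rate 2 Tbph.
c_r = pointwise_min(request_constraint(1.0, 6.0, 2.0), link1, link2)
samples = [c_r(t) for t in (0.0, 2.0, 4.0, 5.5, 6.0)]
print(samples)
```

Callables keep the example short; a stored representation suited to implementation is the step-function form of Section 3.3.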
Reservation decision function $D_r(t)$ returns the reserved data rate for $r$ at time $t$. If the scheduler rejects the request, no bandwidth will be reserved for the request over the whole time axis. Thus a rejection decision can be represented by $f^0$. $D_r(t)$ satisfies:

$$D_r(t) \le C_r(t) \tag{8}$$

$$\int_{t=a_r}^{\infty} D_r(t)\,dt = v_r, \quad \text{if } D_r \ne f^0 \tag{9}$$

In the system state update stage shown in Figure 3:

$$L_i(t) = (L_i - D_r)(t),\ \forall L_i \in \text{path of } r \tag{10}$$

At time $\tau$, an empty link $L_i$ without any reservation has $L_i(t) = B[L_i]\,h(t - \tau)$, where $B[L_i]$ is the total capacity of link $L_i$.

3.3 Step time-rate functions 

General time-rate functions are not suitable for implementation, thus we restrict our discussion to a special class of time-rate functions, the step time-rate functions, which are easy to store and process.

Formally, a function is called a step function if it can be written as a finite linear combination of indicator functions of half-open intervals. Informally speaking, a step function is a piecewise constant function having only finitely many pieces. A time step function $f(t)$ can be represented as:

$$f(t) = a_1 h(t - b_1) + a_2 h(t - b_2) + \dots + a_n h(t - b_n) \tag{11}$$

We denote the set of all step functions as $\mathcal{F}_s \subset \mathcal{F}$. A step function with $n$ non-continuous points can be uniquely represented by a $2 \times n$ matrix

$$\begin{pmatrix} a_1 & \dots & a_n \\ b_1 & \dots & b_n \end{pmatrix}$$

with elements in the first row non-zero, and elements in the second row strictly increasing. All step functions with $n$ non-continuous points form the $n$-step function set $\mathcal{F}^n \subset \mathcal{F}_s$. $\mathcal{F}^0 = \{f^0\}$. $\mathcal{F}^1 = \{\text{all translations of } h(t)\}$. All non-regressive linear combinations of two different elements in $\mathcal{F}^1$ form $\mathcal{F}^2$. For $f^2 \in \mathcal{F}^2$, if $a_1 + a_2 = 0$, $f^2$ and $f^0$ enclose a rectangle in time-rate coordinates. All such $f^2$ form the rectangular function set $\mathcal{F}_{rec}$. We also define the general $n$-step function set $\mathcal{G}^n = \mathcal{F}^0 \cup \mathcal{F}^1 \cup \dots \cup \mathcal{F}^n$.

The following discussion restricts reservation schemes to make decisions in step function form, i.e., $D_r \in \mathcal{F}_s$. For $f^n(t) \in \mathcal{F}^n$ and $f^m(t) \in \mathcal{F}^m$, it is easy to show that $(f^n \min f^m)(t) \in \mathcal{G}^{n+m}$ and $(f^n + f^m)(t) \in \mathcal{G}^{n+m}$, i.e., both the min and plus operations are closed in $\mathcal{F}_s$. Thus the constraint function $C_r(t)$ and the time-(remaining bandwidth) function $L_i(t)$ are also step functions. The computation and space complexity of the min, plus, and order operations over functions $f^n(t)$ and $f^m(t)$ is $O(n + m)$. We discuss the calculation of $D_r(t)$ based on $C_r(t)$ and $v_r$ in the next section.
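One possible way to obtain the $O(n + m)$ bound is a single merge pass over the two sorted breakpoint lists. The sketch below is an illustration under assumptions, not the authors' implementation: it stores a step function in "level form", a list of `(breakpoint, value from that point on)` pairs with value 0 before the first breakpoint, and the helper names `to_levels` and `combine` are hypothetical.

```python
def to_levels(jumps):
    """Convert the jump form [(b_i, a_i)] of Equation (11) to level form [(b_i, f(b_i))]."""
    levels, value = [], 0.0
    for b, a in jumps:
        value += a
        levels.append((b, value))
    return levels

def combine(f, g, op):
    """Apply op pointwise to two step functions in level form using one merge pass."""
    out, i, j, fv, gv = [], 0, 0, 0.0, 0.0
    inf = float("inf")
    while i < len(f) or j < len(g):
        # Next breakpoint of either function.
        t = min(f[i][0] if i < len(f) else inf, g[j][0] if j < len(g) else inf)
        if i < len(f) and f[i][0] == t:
            fv = f[i][1]
            i += 1
        if j < len(g) and g[j][0] == t:
            gv = g[j][1]
            j += 1
        value = op(fv, gv)
        previous = out[-1][1] if out else 0.0
        if value != previous:  # keep only true discontinuities
            out.append((t, value))
    return out

# An empty link of capacity 4 Tbph, and a request constraint of 2 Tbph on [1h, 6h).
link = [(0, 4.0)]
constraint = combine(link, to_levels([(1, 2.0), (6, -2.0)]), min)
# State update of Equation (10) for a decision of 1 Tbph on [1h, 3h) and 2 Tbph on [3h, 5h).
decision = [(1, 1.0), (3, 2.0), (5, 0.0)]
updated_link = combine(link, decision, lambda x, y: x - y)
print(constraint, updated_link)
```

Each breakpoint of either input is consumed exactly once, so the result has at most $n + m$ breakpoints, in line with the closure property stated above.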
Schemes              Accept decision                                          Flexibility
FixTime-FixRate      $D_r(t) = C_r(t)$                                        0
FixTime-FlexRate     $D_r(t) \in \mathcal{F}_{rec}$ with term $h(t - a_r)$    1
FlexTime-FlexRate    $D_r(t) \in \mathcal{F}_{rec}$                           2
Multi-Interval       $D_r(t) \in \mathcal{G}^n$                               $2n - 2$

Table 1: Reservation schemes



4 Reservation schemes 

4.1 Schemes taxonomy and heuristics 

Existing RSVP-type reservation schemes only support reservation of a fixed bandwidth from a fixed start time; we name these FixTime-FixRate schemes. Slightly more general are FixTime-FlexRate schemes, which still enforce a fixed start time but allow the scheduler to flexibly determine the reservation bandwidth. To further generalize the idea, we have FlexTime-FlexRate schemes, which allow a reservation to start at any time in $[a_r, d_r]$ and to reserve any rate (which must be constant) continuously until the transfer completes. Finally, by allowing a reservation to comprise multiple ($n > 1$) sub-intervals with different reservation bandwidths, we have Multi-Interval schemes. Regarding their solution spaces, FixTime-FixRate ⊂ FixTime-FlexRate ⊂ FlexTime-FlexRate ⊂ Multi-Interval. Their different flexibilities are summarized in Table 1.

This flexibility makes it hard to choose a suitable decision $D_r(t)$ when multiple candidates are available. As mentioned in Section 2, there are multiple performance criteria: increasing the accept probability, minimizing flow time, and ensuring fairness among flows, to name just a few. In fact, even for an RSVP-type reservation scheme with only two choices (reject, or accept the request with a fixed rate at a fixed start time), it is hard to make an optimal selection, as proved in [2]. Instead, we use simple heuristics to select a representative scheme from each family for performance comparison. A threshold-based rate-tuning heuristic is used to choose a candidate from the FixTime-FlexRate schemes, which will be detailed in Section 5. Simple Greedy-Accept and Minimize-FlowTime heuristics are used to choose candidates from the FlexTime-FlexRate family and the Multi-Interval family.

Greedy-Accept means: if there is at least one feasible solution to accept an incoming request, the request should not be rejected. Greedily accepting new requests is not optimal in an off-line sense, because sometimes it may be better to Early-Reject a request even when a feasible solution exists, so that capacity can be kept for more rewarding requests which arrive later. Despite this, it is an interesting heuristic to study, because:

• The Greedy-Accept heuristic can be used orthogonally with trunk reservation to mimic the behavior of Early-Reject;

• Greedy-Accept introduces a strict priority based on arrival order, which by itself is a reasonable assignment philosophy.

Minimize-FlowTime means: if there are multiple feasible solutions in the solution space, the one with minimal completion time is chosen. Besides the straightforward benefit of minimizing flow time, this philosophy also helps maximize the utilization of resources in the near future, which are otherwise more likely to be wasted if no new request comes soon. However, since the near future is more densely packed with reservations, assuming all requests have identical $R_{exp}$, a small-volume request with a short life span is more likely to be rejected than a large-volume request with a long life span. This unfairness can also be addressed by volume-based trunk reservation.



4.2 FixTime-FixRate schemes 

In FixTime-FixRate schemes, the request specifies its desired reservation rate. The scheduler can only decide to accept or reject. As shown in [?], reducing the reservation rate increases the system's Erlang capacity. Thus a candidate FixTime-FixRate scheme to maximize the accept rate is to enforce:

$$D_r(t) = \begin{cases} R_r^{\min}\,(h(t - a_r) - h(t - d_r)) & \text{if } R_r^{\min} \le C_r(a_r) \\ f^0 & \text{otherwise} \end{cases} \tag{12}$$

Here $R_r^{\min} = v_r/(d_r - a_r)$ satisfies Equation (9). In this scheme, every accepted request completes its transfer exactly at its deadline if a dull transfer protocol is used. This is the reservation scheme used in Figure 1. Notice that for FixTime schemes without advance reservation, Equation (8) is simplified to consider the value of the constraint function $C_r(t)$ at $a_r$ only, because:

• The reservation of FixTime schemes is forced to begin at $a_r$;

• Under FixTime schemes without advance reservation, the time-(remaining capacity) function $L_i(t)$ for any link $L_i$ is non-decreasing along the time axis.


4.3 FixTime-FlexRate schemes 

FixTime-FlexRate schemes still force the transfer to start at $a_r$, thus $D_r(t) \in \mathcal{F}_{rec}$ must have the term $h(t - a_r)$. Compared to FixTime-FixRate schemes, FixTime-FlexRate schemes can flexibly choose the rate parameter $R_r$ in $D_r(t)$. FixTime-FlexRate schemes allocate a single rate $R_r$ to an accepted request $r$ from its arrival time $a_r$ to its completion time $a_r + v_r/R_r$:

$$D_r(t) = R_r\left(h(t - a_r) - h\left(t - a_r - \frac{v_r}{R_r}\right)\right) \tag{13}$$

The second term above is calculated using Equation (9), while Equation (8) simplifies to $a_r + v_r/R_r \le d_r$, thus $R_r \ge v_r/(d_r - a_r)$, and $R_r \le C_r(a_r)$, similar to FixTime-FixRate schemes.
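The two bounds on $R_r$ derived above can be checked with a few lines of code. This is a minimal sketch that assumes, as in the FixTime discussion, that only the constraint value at the arrival time matters; the function names and the numeric values are hypothetical.

```python
def feasible_rate_range(volume, arrival, deadline, available_at_arrival):
    """Return (lowest, highest) admissible constant rates, or None if none exists.

    lowest  = v_r / (d_r - a_r): the transfer must finish by the deadline.
    highest = C_r(a_r): assumed to be already capped by the request's maximum rate.
    """
    lowest = volume / (deadline - arrival)
    highest = available_at_arrival
    return (lowest, highest) if lowest <= highest else None

def rectangle(volume, arrival, rate):
    """Decision of Equation (13) as a (start, end, rate) triple."""
    return (arrival, arrival + volume / rate, rate)

# Hypothetical request: 4 Tb arriving at 0h with deadline 4h, 2 Tbph available on arrival.
bounds = feasible_rate_range(volume=4.0, arrival=0.0, deadline=4.0, available_at_arrival=2.0)
fastest = rectangle(4.0, 0.0, bounds[1])
print(bounds, fastest)
```

Choosing the lower bound reproduces the FixTime-FixRate decision of Equation (12), while choosing the upper bound minimizes flow time for this request.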
4.4 FlexTime-FlexRate schemes

FlexTime-FlexRate schemes relax the fixed start time constraint. Thus, the decision function $D_r(t)$ of FlexTime-FlexRate schemes can be any rectangular function satisfying Equations (8) and (9). FlexTime-FlexRate schemes allocate a single rate $R_r$ in the interval $[t_r^{start}, t_r^{start} + v_r/R_r] \subseteq [a_r, d_r]$. $D_r$ can be fully characterized by a pair $(t_r^{start}, R_r)$. The completion time is calculated using Equation (9).

To simplify Equation (8), we define the constraint rectangular function set $\mathcal{F}_{rec}^{constraint}$ and the Pareto optimal rectangular function set $\mathcal{F}_{rec}^{Pareto}$ for constraint function $C_r(t)$:

$$\mathcal{F}_{rec}^{constraint} = \{f(t) \mid f(t) \in \mathcal{F}_{rec} \text{ and } f(t) \le C_r(t)\} \tag{14}$$

$$\mathcal{F}_{rec}^{Pareto} = \{f(t) \mid f(t) \in \mathcal{F}_{rec}^{constraint} \text{ and } \nexists\, g(t) \in \mathcal{F}_{rec}^{constraint},\ g(t) > f(t)\} \tag{15}$$

The Pareto optimal rectangular function set of an $n$-step constraint function $C_r(t)$ can be calculated in $O(n^2)$, as illustrated in Figure 5. $\mathcal{F}_{rec}^{Pareto}$ contains $O(n^2)$ elements.

Figure 5: Pareto Optimal Rectangular function set



Applying the Greedy-Accept and Minimize-FlowTime heuristics here: a request $r$ is rejected if and only if there is no $f(t) \in \mathcal{F}_{rec}^{Pareto}$ whose integral is at least $v_r$; otherwise, all Pareto optimal rectangular functions with a large enough integral are checked to identify the one providing the minimum flow time. Given a Pareto optimal rectangular function $f(t) = a_1 h(t - t_1) - a_1 h(t - t_2)$, the earliest completion time it can provide is $t_1 + v_r/a_1$. The implementation of this scheme is detailed in Table 2.
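As an illustration of the selection rule, the sketch below enumerates candidate rectangles by brute force in $O(n^2)$ and keeps the one with the earliest completion time. It is a simplified stand-in for exposition, not the algorithm of Table 2. The constraint is assumed to be given as a list of `(breakpoint, rate from that point on)` pairs whose last entry has rate 0 at the deadline, and the example values are read off the Section 2 scenario (nothing available before 1h, 1 Tbph from 1h, 2 Tbph from 3h), which is an interpretation of the text.

```python
def flextime_flexrate(levels, volume):
    """Pick the single-rate reservation with the earliest completion time.

    levels: [(time, rate)] breakpoints of C_r(t); the last entry must have rate 0.
    Returns (start, finish, rate), or None if no rectangle can carry the volume.
    """
    best = None
    for i, (start, rate) in enumerate(levels):
        height = rate  # minimum of C_r over [levels[i].time, levels[j].time)
        for j in range(i + 1, len(levels)):
            if height <= 0:
                break
            finish = start + volume / height
            # The rectangle must end before the span [start, levels[j].time) does.
            if finish <= levels[j][0] and (best is None or finish < best[1]):
                best = (start, finish, height)
            height = min(height, levels[j][1])
    return best

# Section 2 scenario: request of 6 Tb with deadline 6h, and request of 4 Tb with deadline 4h.
six_tb = flextime_flexrate([(0, 0), (1, 1), (3, 2), (6, 0)], 6)
four_tb = flextime_flexrate([(0, 0), (1, 1), (3, 2), (4, 0)], 4)
print(six_tb, four_tb)
```

Under these assumptions the 6 Tb request is served at 2 Tbph from 3h to 6h, while the 4 Tb request cannot be served by any single-rate reservation, consistent with the earlier remark that its only feasible solution uses two rates.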



4.5 Multi-Interval schemes 

Compared to all the schemes above, a reservation decision in Multi-Interval schemes can be composed of multiple intervals with different reservation rates. Note that Multi-Interval schemes
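#include <stdbool.h>

/* One breakpoint of the step constraint function C_r(t). */
struct time_rate {
    double time;     /* position of the discontinuity */
    double rate;     /* value of C_r(t) from this point on */
    bool unvisited;  /* must be initialized to true by the caller */
};

/* Only the fields of the request 6-tuple that are used below. */
struct request {
    double volume;
    double deadline;
};

Input: 6-tuple representation of request r and its constraint function C_r(t), which is an n-step
function represented by a time-rate vector v. For i in [0, ..., n-1]:
    v[i].time is the (i+1)-th non-continuous point of C_r(t),
    v[i].rate = C_r(v[i].time).

Output: decision d in a time-rate structure (d.time is the start time and d.rate the reserved rate;
d.rate = 0 means the request is rejected).

/* Index of the next breakpoint after i where the rate increases, or n. */
int nextIncrease(const struct time_rate v[], int n, int i) {
    for (i++; i < n; i++)
        if (v[i-1].rate < v[i].rate)
            break;
    return i;
}

/* Index of the next breakpoint after i whose rate is below v[i].rate, or n.
   Each breakpoint is expanded only once; later visits return n. */
int nextDecrease(struct time_rate v[], int n, int i) {
    if (!v[i].unvisited)
        return n;
    v[i].unvisited = false;
    double r = v[i].rate;
    for (i++; i < n; i++)
        if (r > v[i].rate)
            break;
    return i;
}

struct time_rate reservation(struct request r, struct time_rate v[], int n) {
    struct time_rate d;
    d.time = r.deadline;  /* best completion time found so far */
    d.rate = 0;
    d.unvisited = false;

    int left = 0;
    if (n > 0 && v[0].rate <= 0)  /* skip an initial step with no bandwidth */
        left = nextIncrease(v, n, 0);

    for (; left < n-1 && v[left].time < d.time; left = nextIncrease(v, n, left)) {
        double resv_rate = v[left].rate;
        for (int right = nextDecrease(v, n, left); right < n; right = nextDecrease(v, n, right)) {
            double finish = v[left].time + r.volume / resv_rate;
            /* the rectangle must end before the rate drops at v[right] */
            if (finish <= v[right].time && (d.rate == 0 || finish < d.time)) {
                d.time = finish;
                d.rate = resv_rate;
                break;
            }
            resv_rate = v[right].rate;
        }
    }

    if (d.rate > 0) d.time -= r.volume / d.rate;  /* completion time -> start time */
    return d;
}

Table 2: Greedy-Accept Minimize-FlowTime FlexTime-FlexRate schemes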
are different from preemptive schemes. Although multiple rates can be used in Multi-Interval schemes, and flows may be scheduled to transfer in two discontinuous intervals, this decision is made at the moment the request arrives and is not changed (preempted) after that.



[Figure 6 plot: constraint function $C_r(t)$ versus time, between arrival a and deadline d.]

Figure 6: Multi-Interval schemes

Applying the Greedy-Accept and Minimize-FlowTime heuristics here, if the integral of $C_r(t)$ over the time axis is at least $v_r$:

$$D_r(t) = \begin{cases} C_r(t) & t \le \tau \\ 0 & t > \tau \end{cases} \tag{16}$$

where time $\tau$ satisfies $\int_{a_r}^{\tau} C_r(t)\,dt = v_r$. $D_r(t) = f^0$ if no such $\tau$ exists. As shown in Figure 6, when $C_r(t) \in \mathcal{F}^n$ is an $n$-step function, the computational complexity of this Multi-Interval scheme (MR-MaxPack-MinDelay) is $O(n)$, and $D_r(t) \in \mathcal{G}^n$.
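Equation (16) amounts to filling the constraint function from the left until the requested volume is reached. A minimal sketch of that single pass follows; the representation (a list of `(breakpoint, rate from that point on)` pairs ending with rate 0 at the deadline) and the function name `multi_interval` are assumptions made for illustration, and the example constraint is again the Section 2 scenario as read from the text.

```python
def multi_interval(levels, volume):
    """Greedy fill: reserve all of C_r(t) from the start until `volume` is carried.

    levels: [(time, rate)] breakpoints of C_r(t); the last entry must have rate 0.
    Returns a list of (start, end, rate) pieces, or None if C_r cannot carry the volume.
    """
    pieces, remaining = [], volume
    for (start, rate), (end, _) in zip(levels, levels[1:]):
        if rate <= 0:
            continue
        capacity = rate * (end - start)
        if capacity >= remaining:
            # The transfer completes inside this step, at time tau of Equation (16).
            pieces.append((start, start + remaining / rate, rate))
            return pieces
        pieces.append((start, end, rate))
        remaining -= capacity
    return None

six_tb = multi_interval([(0, 0), (1, 1), (3, 2), (6, 0)], 6)
four_tb = multi_interval([(0, 0), (1, 1), (3, 2), (4, 0)], 4)
two_tb = multi_interval([(0, 0), (1, 1), (2, 0)], 2)
print(six_tb, four_tb, two_tb)
```

On this scenario the sketch reproduces the Section 2 discussion: the 6 Tb request finishes at 5h, the 4 Tb request is accepted with two rates, and the 2 Tb request with deadline 2h has no feasible reservation.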

Sometimes it is useful to enforce $D_r(t) \in \mathcal{G}^n$ for a constant $n$. For example, FlexTime-FlexRate schemes are the subset of Multi-Interval schemes enforcing $D_r(t) \in \mathcal{G}^2$. If a reservation decision is allowed to be composed of at most two adjacent subintervals with different rates, it can be modeled as the subset of Multi-Interval schemes enforcing $D_r(t) \in \mathcal{G}^3$.
5 Performance evaluation

5.1 Simulation setup

We use simulation to demonstrate the potential performance gain from increasing flexibility. We consider both blocking probability and mean flow time for the following schemes:

• The FixTime-FixRate-$R_{\max}$ scheme is a FixTime-FixRate scheme with a reservation rate of $R_{\max}$;

• The FixTime-FixRate-$R_{\min}$ scheme is a FixTime-FixRate scheme with a reservation rate of $R_{\min}$;

• The Threshold-FixTime-FlexRate scheme is a simple FixTime-FlexRate scheme which reserves $R_{\max}$ when the minimum unreserved bandwidth among all links along the path is above a threshold (set to 20% of link capacity in this simulation), and reserves $R_{\min}$ otherwise;

• The Greedy-Accept and Minimize-FlowTime heuristic in the FlexTime-FlexRate family;

• The Greedy-Accept and Minimize-FlowTime heuristic in the Multi-Interval family.

For all the above settings, a dull transport protocol is assumed, which uses all of, and only, the reserved bandwidth.
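The rate selection of the Threshold-FixTime-FlexRate scheme can be sketched as follows. The function name and arguments are hypothetical, all links are assumed to share the same capacity, and the rule that a request is blocked when the chosen rate exceeds the bottleneck's unreserved bandwidth is an assumption that follows from Equation (8) rather than an explicit statement in the text.

```python
def threshold_rate(unreserved_on_path, link_capacity, r_max, r_min, threshold=0.20):
    """Rate reserved by the threshold heuristic, or None if the request is blocked.

    unreserved_on_path: unreserved bandwidth of each link along the request's path.
    threshold: fraction of link capacity above which the full rate r_max is granted.
    """
    bottleneck = min(unreserved_on_path)
    rate = r_max if bottleneck > threshold * link_capacity else r_min
    # Assumption: block when even the selected rate does not fit on the bottleneck link.
    return rate if rate <= bottleneck else None

# Example with C = 100, R_max = C/10 and R_min = C/20, as in the simulation setup.
lightly_loaded = threshold_rate([60, 35], 100, r_max=10, r_min=5)
heavily_loaded = threshold_rate([60, 15], 100, r_max=10, r_min=5)
saturated = threshold_rate([60, 3], 100, r_max=10, r_min=5)
print(lightly_loaded, heavily_loaded, saturated)
```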

To simplify the discussion of the potential gain from increasing flexibility, we make the idealized assumption that bulk data transfer requests arrive online according to a Poisson process with parameter $\lambda$, and that all requests have the same volume $v$, $R_{\max} = C/10$ and $R_{\min} = C/20$, where $C$ is the link capacity. Observations in this simple setting also help explain the system behavior in more general settings, which may have different arrival processes, volume distributions, $R_{\max}$ and $R_{\min}$.

5.2 Single Link setting 

We first consider the case of single bottleneck link. Performance of above schemes is plotted 
under increasing load. 

Figure 7 shows that in terms of blocking probability, the FixTime-FixRate-Rmin scheme 
performs better than the FixTime-FixRate-Rmax scheme. When the reservation rate decreases, 
two conflicting effects occur: on one hand, more requests can be accepted simultaneously; 
on the other hand, each request takes longer to finish. It has been shown that decreasing 
reservation rates increases the system's Erlang capacity, which is verified in this figure. 
However, as FixTime-FixRate-Rmin always conservatively reserves $R_{\min}$, its request flow 
time is always $v_r/R_{\min}$. In contrast, the flow time of the FixTime-FixRate-Rmax scheme 
is always $v_r/R_{\max}$, which is only half of $v_r/R_{\min}$ under our simulation setting, 
as shown in Figure 8. 
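The Erlang-capacity effect can be checked numerically with a small sketch. It rests on an assumption that is not taken from the report: a single link under a FixTime-FixRate scheme is treated as an Erlang loss system with C/R servers and holding time v/R, and deadlines are ignored. The function names and the integer units (C = 20, R_max = 2, R_min = 1, v = 2) are hypothetical and chosen only so that C/R_max = 10 and C/R_min = 20.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Erlang B blocking probability, computed with the standard recursion."""
    b = 1.0  # blocking probability with zero servers
    for n in range(1, servers + 1):
        b = offered_load * b / (n + offered_load * b)
    return b


def blocking_for_rate(lam: float, volume: float, capacity: int, rate: int) -> float:
    """Blocking of a fixed-rate scheme, modeled (as an assumption) as a loss system."""
    servers = capacity // rate          # simultaneous transfers the link can carry
    offered_load = lam * volume / rate  # arrival rate times holding time v/R
    return erlang_b(servers, offered_load)


# Illustrative units only: C = 20, R_max = 2, R_min = 1, v = 2.
for lam in (4.0, 8.0, 12.0):
    b_max = blocking_for_rate(lam, 2.0, 20, 2)
    b_min = blocking_for_rate(lam, 2.0, 20, 1)
    print(f"lambda={lam}: blocking at R_max={b_max:.4f}, at R_min={b_min:.4f}")
```

Halving the rate doubles both the number of servers and the offered load, and under this model the larger system blocks less, which matches the ordering reported for the two FixTime-FixRate schemes.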

Figure 7: Blocking probability of reservation schemes 

Figure 8: Mean flow time of reservation schemes 

Exploiting the flexibility of selecting reservation rates, the Threshold-FixTime-FlexRate 
scheme strikes a good balance between reducing blocking probability and minimizing mean 
flow time. When load is low, a new request reserves the full rate $R_{\max}$, so that its flow 
time is minimized. Although the new request aggressively seizes bandwidth, the threshold 
statistically ensures that abundant bandwidth is still left. Thus the probability is low 
that flows arriving in the near future are blocked because of this aggressive request. 
Instead, the new request exploits resources that would otherwise be wasted, and it also 
releases network resources more quickly, which benefits the system at a medium time 
scale. In the lightly loaded region, the Threshold-FixTime-FlexRate scheme performs 
similarly to the FixTime-FixRate-Rmax scheme. However, when load increases, links often 
run in a saturated state, and a new request has a higher probability of finding the 
remaining capacity below the threshold. Thus in this region, the Threshold-FixTime-FlexRate 
scheme automatically adapts its behavior to perform similarly to FixTime-FixRate-Rmin. 
From the two figures, it is observed that the Threshold-FixTime-FlexRate scheme has a much 
lower blocking probability than the FixTime-FixRate-Rmax scheme, while having a much 
lower mean flow time than the FixTime-FixRate-Rmin scheme. 
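The behavior of the three FixTime schemes on a single link can be illustrated with a minimal event-driven sketch. It is not the simulator used for the figures. It assumes that every request starts immediately at its arrival time, that the deadline always permits a transfer at $R_{\min}$, and that integer units are used (C = 20, R_max = 2, R_min = 1, v = 2); the function name `simulate` and its parameters are hypothetical.

```python
import heapq
import random


def simulate(scheme, lam, n_requests=20000, capacity=20, r_max=2, r_min=1,
             volume=2.0, threshold=0.2, seed=1):
    """Return (blocking probability, mean flow time) on one link.

    scheme is "rmax", "rmin" or "threshold". Requests arrive as a Poisson
    process of rate lam and are either started at once or blocked.
    """
    rng = random.Random(seed)
    active = []      # min-heap of (end_time, rate) for running reservations
    reserved = 0     # bandwidth currently reserved on the link
    now = 0.0
    blocked = 0
    flow_times = []
    for _ in range(n_requests):
        now += rng.expovariate(lam)
        # Release reservations that have finished by the arrival time.
        while active and active[0][0] <= now:
            _, finished_rate = heapq.heappop(active)
            reserved -= finished_rate
        free = capacity - reserved
        if scheme == "rmax":
            rate = r_max
        elif scheme == "rmin":
            rate = r_min
        elif scheme == "threshold":
            # Reserve R_max only while unreserved bandwidth exceeds the threshold.
            rate = r_max if free > threshold * capacity else r_min
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        if rate > free:
            blocked += 1
            continue
        reserved += rate
        duration = volume / rate
        heapq.heappush(active, (now + duration, rate))
        flow_times.append(duration)
    return blocked / n_requests, sum(flow_times) / len(flow_times)


for name in ("rmax", "rmin", "threshold"):
    p_block, mean_flow = simulate(name, lam=9.0)
    print(f"{name}: blocking={p_block:.4f}, mean flow time={mean_flow:.3f}")
```

By construction, the mean flow time of the threshold variant always lies between $v/R_{\max}$ and $v/R_{\min}$; how close it sits to either end depends on the load passed as `lam`.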

In this single link setting, the behaviors of the selected FlexTime-FlexRate and 
Multi-Interval schemes are identical. This is an artifact of the uniform volume and 
$R_{\max}$ setting, as well as the integer value of $C/R_{\max}$. We also conducted extensive 
simulations with more general volume, $R_{\max}$ and $R_{\min}$ distributions over a single 
link, and the results show that the performance of FlexTime-FlexRate and Multi-Interval 
remains close. Both FlexTime-FlexRate and Multi-Interval schemes perform much better 
than the three schemes above in both blocking rate and flow time. 

A remarkable observation is that FlexTime-FlexRate and Multi-Interval schemes with a 
dull transport protocol even outperform the FixTime-FixRate-Rmin scheme equipped with an 
ideal transport protocol, in terms of both blocking rate and flow time (see the Rmin + 
Ideal Transport Protocol curve in both figures). In addition, the small flow time of Rmin 
+ Ideal Transport Protocol is achieved opportunistically by the ideal transport protocol, 
and cannot be guaranteed at the moment when the reservation is made (the 
FixTime-FixRate-Rmin scheme can only guarantee that accepted requests are completed 
before their deadlines). Thus other co-allocated resources cannot exploit the small flow 
time to increase their scheduling efficiency. On the other hand, the request flow time is 
known and guaranteed in reservation schemes at the moment when the request is processed. 
This predictability can benefit other co-allocated resources. This result strongly 
motivates the study of advanced reservation schemes. 






5.3 Grid network setting 

We also evaluate the performance of the different schemes in a network setting. We use the 
topology shown in Figure 9: n ingress sites and n egress sites are interconnected by an 
over-provisioned core network. Each site is composed of a cluster of grid nodes, and is 
connected to the core network by a link of capacity $C$. The maximal aggregate bandwidth 
demand from the cluster may exceed $C$, making these links potential bottlenecks. For 
simplicity, we assume that the core network is over-provisioned, like the envisioned 
Grid5000 network in France. The core network can be provisioned, for example, using the 
hose model [3]. When a request is generated, its source is randomly selected from the 
ingress sites, and a random destination is then selected independently among the egress 
sites. All sites have the same probability of being chosen. 
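The request generation just described can be sketched as follows. This is only an illustration under stated assumptions: arrivals follow the Poisson process of Section 5.1, sites are identified by integer indices, and the function name `generate_requests` and its parameters are hypothetical.

```python
import random


def generate_requests(n_sites, lam, count, seed=0):
    """Generate (arrival_time, ingress_site, egress_site) tuples.

    Inter-arrival times are exponential with rate lam (Poisson arrivals);
    the ingress and egress sites are drawn uniformly and independently.
    """
    rng = random.Random(seed)
    now = 0.0
    requests = []
    for _ in range(count):
        now += rng.expovariate(lam)
        ingress = rng.randrange(n_sites)  # uniform over ingress sites
        egress = rng.randrange(n_sites)   # independent, uniform over egress sites
        requests.append((now, ingress, egress))
    return requests


# Example: 10 ingress and 10 egress sites, as in the network experiment.
for arrival, ingress, egress in generate_requests(n_sites=10, lam=2.0, count=5):
    print(f"t={arrival:.3f}  I{ingress + 1} -> E{egress + 1}")
```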



Figure 9: Topology 

Figure 10 and Figure 11 plot the performance when there are 10 ingress nodes and 10 
egress nodes in the network. Compared to Figure 7 and Figure 8, three phenomena are 
observed: 

• Overall, the performance of all schemes degrades slightly; 

• The blocking probability of the FlexTime-FlexRate scheme shows a big increase, and its 
performance is no longer close to that of the Multi-Interval scheme; 

• The mean flow time of the Multi-Interval scheme deteriorates noticeably. 

The overall performance degradation can be traced to the fact that a reservation in a 
network needs to consider multiple links (both the ingress and the egress link in this 
topology). A reservation request is blocked, or its flow time becomes longer, when either 
of them is congested. If we assume that the congestion states of the two links are 
independently and identically distributed, with mean congestion probability $p$, the 
probability that at least one of them is congested is $2p - p^2 > p$. This intuitively 
explains the overall degradation of performance. 
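The step behind this bound can be written out explicitly. Under the independence assumption stated above, a request sees no congestion only if both the ingress and the egress link are uncongested, so:

```latex
\begin{align*}
P(\text{at least one link congested})
  &= 1 - (1-p)^2 \\
  &= 2p - p^2 \\
  &= p + p(1-p) \;>\; p \qquad \text{for } 0 < p < 1 .
\end{align*}
```

The extra term $p(1-p)$ is the probability that the second link is congested while the first one is not.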
Figure 11: Mean flow time of reservation schemes 



The degradation of the FlexTime-FlexRate scheme's blocking probability and of the 
Multi-Interval scheme's mean flow time can be explained using a simple example, shown in 
Figure 12. 

Figure 12: A fragmentation example (panels: Ingress $I_1$, Egress $E_1$, Ingress $I_2$, Egress $E_2$) 

In this example, there are two ingress links and two egress links interconnected by an 
over-provisioned core network. Existing request $r_1$ reserves bandwidth in $I_1$ and $E_1$, 
while existing request $r_2$ reserves bandwidth in $I_2$ and $E_2$, as shown in the figure. At 
the current system time, a new request $r_3$ arrives at $I_1$ with destination $E_2$. The three 
FixTime schemes (FixTime-FixRate-Rmax, FixTime-FixRate-Rmin and 
Threshold-FixTime-FlexRate) are not allowed to accept $r_3$, since bandwidth is fully 
reserved at the current time. This prevents the fragmentation that, as shown in the figure, 
arises when the FlexTime-FlexRate and Multi-Interval schemes exploit their flexibility to 
accept $r_3$. This time-axis fragmentation increases the blocking probability of the 
FlexTime-FlexRate scheme, since it can only allocate a single continuous time interval. On 
the other hand, the blocking rate of the Multi-Interval scheme is affected less, because it 
can make use of multiple (discontinuous) intervals. However, the mean flow time of the 
Multi-Interval scheme is affected. 
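The effect of time-axis fragmentation on the two families can be illustrated on a discretized free-bandwidth profile. The sketch below is a simplified stand-in, not the heuristics evaluated in this section. It assumes unit-length time slots up to the deadline, a single constant rate for the contiguous case, and an earliest-first greedy fill for the multi-interval case; all function names are hypothetical.

```python
def flex_time_flex_rate(free, volume, r_max):
    """Earliest-finishing single contiguous interval with one constant rate.

    free[k] is the unreserved bandwidth in slot k. Returns (start, end, rate)
    with end exclusive, or None if the request must be blocked.
    """
    best = None
    for start in range(len(free)):
        rate = r_max
        for end in range(start + 1, len(free) + 1):
            rate = min(rate, free[end - 1])   # constant rate is capped by the tightest slot
            if rate <= 0:
                break                          # a fully reserved slot ends this window
            if rate * (end - start) >= volume:
                if best is None or end < best[1]:
                    best = (start, end, rate)
                break
    return best


def multi_interval(free, volume, r_max):
    """Greedy earliest-first fill over possibly non-adjacent slots.

    Returns a list of (slot, rate) pairs, or None if the volume does not fit.
    """
    plan, remaining = [], volume
    for slot, available in enumerate(free):
        rate = min(available, r_max, remaining)
        if rate > 0:
            plan.append((slot, rate))
            remaining -= rate
        if remaining <= 0:
            return plan
    return None


# A fragmented profile: free slots alternate with fully reserved ones.
fragmented = [2, 0, 2, 0, 2, 0]
print(flex_time_flex_rate(fragmented, volume=4, r_max=2))
print(multi_interval(fragmented, volume=4, r_max=2))
```

On the fragmented profile the contiguous search finds no window and blocks the request, while the multi-interval fill accepts it but finishes in the third slot instead of the second, which mirrors the two effects discussed above.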

In the above examples, Multi-Interval schemes often give the best performance. However, 
using multiple intervals comes at a cost. Figure 13 shows how the mean number of 
sub-intervals per flow grows as the network size increases. This number becomes quite 
stable around a small level once the number of nodes grows larger than the multiplexing 
level of a single link, which is $C/R_{\max}$. This result holds for different load levels. 
This observation shows the feasibility of exploiting the Multi-Interval scheme. 
Figure 13: Mean number of intervals per flow in Multi-Interval scheme (x-axis: number of in(e)gress nodes) 

6 System architecture 

The logic framework shown in Figure corresponds to a centralized scheduler, which may 
not be desirable because: 

• links may be under the control of different authorities; 

• when the network size grows, the centralized scheduler itself may become a bottleneck; 

• the centralized scheduler is a single point of failure. 




Figure 14: Distributed architecture 
Thus we present a simple distributed architecture, as shown in Figure 14. In this 
architecture, every bottleneck link is associated with a local bandwidth scheduler, which 
maintains the local reservation state. A request generated by the source first arrives at 
link $L_1$, whose scheduler uses the min operation to combine its local link state 
constraint into the request specification. The updated request specification is forwarded 
to the next hop. In this way, the constraint function is updated hop by hop: 
$C_r^i(t) = (L_i \min C_r^{i-1})(t)$. When the request reaches the last hop $L_n$, the 
constraint function $C_r(t)$ is completely constructed, and the scheduler in $L_n$ makes the 
decision $D_r(t)$ based on $C_r(t)$. $D_r(t)$ is sent to the destination, which may issue a 
confirmation. $D_r(t)$ is then sent back to the source along the same path. $D_r(t)$ is kept 
unchanged along the path, and each hop uses $D_r(t)$ to update its local reservation 
state $L_i$. 
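The hop-by-hop exchange can be sketched with time-rate functions discretized into unit slots. This is an illustration under assumptions, not the report's implementation: each link state is taken to be the unreserved bandwidth per slot, the request's initial constraint is its maximum rate in every slot up to the deadline, and the `decide` function is a placeholder greedy rule. All names are hypothetical.

```python
def forward_pass(request_constraint, link_free):
    """Fold each hop's free-bandwidth profile into the constraint with min."""
    constraint = list(request_constraint)
    for free in link_free:  # hops L_1 .. L_n in path order
        constraint = [min(f, c) for f, c in zip(free, constraint)]
    return constraint


def decide(constraint, volume):
    """Placeholder decision at the last hop: fill the earliest slots first.

    Returns the per-slot reserved rate D_r, or None if the volume does not fit.
    """
    decision, remaining = [], volume
    for cap in constraint:
        rate = min(cap, remaining)
        decision.append(rate)
        remaining -= rate
    return decision if remaining == 0 else None


def backward_pass(decision, link_free):
    """Each hop subtracts the unchanged decision from its local free bandwidth."""
    return [[f - d for f, d in zip(free, decision)] for free in link_free]


links = [[10, 4, 10], [6, 10, 2]]           # free bandwidth per slot on L_1 and L_2
constraint = forward_pass([5, 5, 5], links)  # request limited to rate 5 per slot
decision = decide(constraint, volume=8)
print(constraint, decision)
if decision is not None:
    print(backward_pass(decision, links))
```

Keeping the decision unchanged on the way back means every hop applies the same reservation, so no hop needs to know the state of the others.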

If we single out a local scheduler, its logic can still be interpreted using the logic 
framework of the centralized scheduler. The only difference is that for schedulers not in 
the last hop, the "func" operation is not a local operation but depends recursively on the 
next hop. 
7 Related works 

Admission control and bandwidth reservation have been studied extensively in multimedia 
networking. A real-time flow normally requests a specified amount of bandwidth. Existing 
reservation schemes such as RSVP attempt to reserve the specified bandwidth immediately 
when the request arrives. The reservation remains in effect for an indefinite duration, 
until an explicit "Teardown" signal is issued or the soft state expires. No time-indexed 
reservation state is kept. 

Time-indexed reservation is needed when considering advance reservation of bandwidth 
[15], which allows bandwidth to be requested before the actual transfer is ready to happen. 
For example, a scheduled tele-conference may reserve bandwidth for a specified future time 
interval. 0] shows that advance reservation causes bandwidth fragmentation along the time 
axis, which may significantly reduce the acceptance probability of requests arriving later. 
To address this problem, they propose the concept of malleable reservation, which defines 
an advance reservation request with flexible start time and rate. 

Optimal control and its complexity have been studied for different levels of flexibility. 
j2] studies call admission control in a resource-sharing system, i.e. how to use the reject 
flexibility regarding different classes of traffic. The optimal policy structure is 
identified for some special cases. [12] proved that in a network with multiple ingress and 
egress sites, off-line optimization of the accept rate for uniform-volume uniform-rate 
requests with randomly specified life spans is NP-complete. They also consider flexible 
tuning of the reservation rate. The increase of a system's Erlang capacity obtained by 
decreasing the service rate has also been studied. In essence, such service rate scaling is 
identical to capacity scaling, which is studied by QH] and [H] to approximate large loss 
networks. 

There is also a large literature on online job scheduling with deadlines, for example, (Sj, 
[11], [13]. A job monopolizes the processor for the time it is scheduled, which maps 
exactly to packet-level scheduling, while at the flow level we must consider that multiple 
flows share bandwidth concurrently, as represented by $R_{\max}$. 
8 Conclusion 

In this paper, we study the bandwidth reservation problem for bulk data transfers in grid 
networks. We model grid networks as multiple sites interconnected by wide area networks 
with potential bottlenecks. Data transfer requests arrive online with specified volumes and 
deadlines, which allows more flexibility in the design of reservation schemes. We formalize 
a general non-preemptive reservation framework, and use simulation to examine the impact 
of flexibility on performance. We also propose a simple distributed architecture for the 
given framework. The increased flexibility can potentially improve system performance, but 
the enlarged design space also raises new challenges in identifying appropriate reservation 
schemes inside the solution space. 